Selection Methods for Multidimensional Datasets
Authors
Abstract
One of the most challenging problems encountered when analysing multidimensional datasets is the overabundance of features. Feature selection algorithms are used to remove irrelevant, redundant, and noisy information from the data. The two-level feature selection method analysed here combines wrapper-based and filter-based feature search and evaluation. The experiments were performed on a database with 257 attributes and 593 instances, containing UAB graduates' answers to a questionnaire. The results showed that two-level feature selection reduced the time taken to build classification models for multidimensional datasets and, in some cases, also improved accuracy rates. 2010 Mathematics Subject Classification: 62H30, 68T99.
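As a rough illustration of the two-level idea described in the abstract, the sketch below chains a filter stage and a wrapper stage using only the Python standard library. Everything in it is an assumption made for illustration and is not taken from the paper: the filter scores features by absolute Pearson correlation with the label, the wrapper is a greedy forward search scored by leave-one-out accuracy of a nearest-centroid classifier, and the data are synthetic (one informative feature plus seven noise features). The function names `filter_stage`, `loo_accuracy`, and `wrapper_stage` are hypothetical.

```python
import random
from statistics import mean, pstdev


def filter_stage(X, y, keep):
    """Level 1 (filter): keep the `keep` features most correlated with the label."""
    my, sy = mean(y), pstdev(y)

    def score(j):
        col = [row[j] for row in X]
        mx, sx = mean(col), pstdev(col)
        if sx == 0 or sy == 0:
            return 0.0  # a constant column carries no information
        cov = mean([(a - mx) * (b - my) for a, b in zip(col, y)])
        return abs(cov / (sx * sy))

    return sorted(range(len(X[0])), key=score, reverse=True)[:keep]


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`.

    Assumes every class has at least two instances.
    """
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [mean([r[j] for r in rows]) for j in features]
        point = [X[i][j] for j in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def wrapper_stage(X, y, candidates):
    """Level 2 (wrapper): greedy forward selection over the filtered candidates."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for j in candidates:
            if j in selected:
                continue
            acc = loo_accuracy(X, y, selected + [j])
            if acc > best:  # strict improvement only, so the search terminates
                best, choice, improved = acc, j, True
        if improved:
            selected.append(choice)
    return selected, best


# Synthetic data (assumption): feature 0 separates the classes, the rest are noise.
random.seed(0)
y = [i % 2 for i in range(40)]
X = [
    [4 * label + random.uniform(-1, 1)] + [random.uniform(-1, 1) for _ in range(7)]
    for label in y
]

candidates = filter_stage(X, y, keep=3)           # cheap pre-screening
selected, best = wrapper_stage(X, y, candidates)  # costly search on fewer features
print(candidates, selected, best)
```

The design point is that the expensive wrapper search only evaluates subsets of the few features that survive the cheap filter, which is one plausible reason a two-level scheme can shorten model-building time.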
Similar resources
Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets
Objective(s): This study addresses feature selection for breast cancer diagnosis. The process uses a wrapper approach based on GA feature selection and a PS-classifier. The experimental results show that the proposed model is comparable to other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we ...
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example in cancer classification....
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High-dimensional microarray datasets are difficult to classify since they have many features, a small number of instances, and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...
An Efficient Feature Subset Selection Algorithm for Classification of Multidimensional Dataset
Multidimensional medical data classification has recently received increased attention from researchers working on machine learning and data mining. In a multidimensional dataset (MDD), each instance is associated with multiple class values. Due to this complex nature, feature selection and classifier construction from an MDD are typically more expensive or time-consuming. Therefore, we need a robust featu...
A Comparison of Two Strategies for Scaling Up Instance Selection in Huge Datasets
Instance selection is becoming more and more relevant due to the huge amount of data that is constantly being produced. However, although current algorithms are useful for fairly large datasets, many scaling problems arise when the number of instances is in the hundreds of thousands or millions. Most instance selection algorithms are of complexity at least O(n), n being the number of instances. ...